{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 图像识别分类项目 - 学习笔记\n",
    " \n",
    "## 项目概述\n",
    " \n",
    "本项目致力于开发一个高效的图像识别分类系统，采用了著名的CIFAR-10数据集进行训练和测试。CIFAR-10数据集包含了10个类别的60000张32x32彩色图像，涵盖了飞机、汽车、鸟类、猫、鹿、狗、蛙类、马、船和卡车这10样常见物体。利用PyTorch深度学习框架，构建一个基于卷积神经网络（CNN）的图像分类模型，并通过一系列优化措施提升了模型的性能。\n",
    "\n",
    "## Requirements\n",
    "\n",
    "要运行本项目，需要满足以下软件和库的要求：\n",
    "\n",
    "### 基本要求\n",
    "\n",
    "Python：建议使用Python 3.8或更高版本，以确保与项目中使用的所有库的兼容性。\n",
    "\n",
    "PyTorch：一个开源的深度学习库，提供了丰富的神经网络组件和GPU加速功能。本项目依赖于PyTorch及其相关库torchvision。\n",
    "\n",
    "### 库依赖\n",
    "\n",
    "PyTorch：版本建议为1.8.0或更高，以支持项目中使用的神经网络架构和训练流程。\n",
    "\n",
    "torchvision：PyTorch的图像和视频处理库，版本建议为0.9.0或更高，用于数据增强和加载CIFAR-10数据集。\n",
    "\n",
    "TensorBoard：一个用于可视化深度学习模型训练过程的工具，版本建议为2.5.0或更高。\n",
    "\n",
    "pillow（PIL Fork）：一个图像处理库，用于图像的预处理和增强。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 项目结构与内容\n",
    " \n",
    "### 数据准备\n",
    " \n",
    "CIFAR-10数据集从官方渠道下载，并被划分为训练集和测试集。训练集包含50000张图像，用于模型的训练；测试集包含10000张图像，用于模型的评估。\n",
    " \n",
    "### 数据预处理\n",
    " \n",
    "在数据预处理阶段，对图像进行归一化处理，使其像素值在0到1之间，同时保持了图像的原始尺寸（32x32）。此外，还使用了PyTorch的`torchvision.transforms`模块进行数据增强，包括随机水平翻转和随机裁剪，以增加模型的泛化能力。具体的数据增强代码如下所示："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torchvision.transforms as transforms\n",
    " \n",
    "train_transform = transforms.Compose([\n",
    "    transforms.RandomHorizontalFlip(),\n",
    "    transforms.RandomCrop(32, padding=4),\n",
    "    transforms.ToTensor(),\n",
    "    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))\n",
    "])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用上述train_transform对训练集和测试集进行处理："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torchvision.datasets as datasets\n",
    "from torch.utils.data import DataLoader\n",
    "\n",
    "# 预处理数据集\n",
    "train_data = datasets.CIFAR10(\"./dataset\", train=True, transform=train_transform, download=True)\n",
    "test_data = datasets.CIFAR10(\"./dataset\", train=False, transform=train_transform, download=True)\n",
    "\n",
    "# 训练集和测试集长度\n",
    "train_data_size = len(train_data)\n",
    "test_data_size = len(test_data)\n",
    "print(f\"训练数据集长度为：{train_data_size}\")\n",
    "print(f\"测试数据集长度为：{test_data_size}\")\n",
    "\n",
    "# 利用DataLoader加载数据集\n",
    "train_dataloader = DataLoader(train_data, batch_size=64)\n",
    "test_dataloader = DataLoader(test_data, batch_size=64)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 模型构建与优化、训练"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "from torch import nn\n",
    "import torchvision\n",
    "from torch.optim.lr_scheduler import StepLR\n",
    "\n",
    "# 定义训练的设备\n",
    "device = torch.device(\"cuda\")\n",
    "\n",
    "# 搭建神经网络\n",
    "class VisionNet(nn.Module):\n",
    "    def __init__(self):\n",
    "        super(VisionNet, self).__init__()\n",
    "        self.model = nn.Sequential(\n",
    "            nn.Conv2d(3, 64, 5, 1, 2),\n",
    "            nn.BatchNorm2d(64),  # 添加批量归一化层\n",
    "            nn.ReLU(),  # 非线性变换激活函数\n",
    "            nn.MaxPool2d(2),\n",
    "\n",
    "            nn.Conv2d(64, 64, 5, 1, 2),\n",
    "            nn.BatchNorm2d(64),\n",
    "            nn.ReLU(),\n",
    "            nn.MaxPool2d(2),\n",
    "\n",
    "            nn.Conv2d(64, 128, 5, 1, 2),\n",
    "            nn.BatchNorm2d(128),\n",
    "            nn.ReLU(),\n",
    "            nn.MaxPool2d(2),\n",
    "\n",
    "            nn.Conv2d(128, 128, 5, 1, 2),\n",
    "            nn.BatchNorm2d(128),\n",
    "            nn.ReLU(),\n",
    "            nn.MaxPool2d(2),\n",
    "\n",
    "            nn.Flatten(),\n",
    "            nn.Linear(128 * 2 * 2, 128),\n",
    "            nn.ReLU(),\n",
    "            nn.Dropout(0.5),  # 添加Dropout层，丢弃率为0.5\n",
    "            nn.Linear(128, 10)\n",
    "        )\n",
    "\n",
    "    def forward(self, x):\n",
    "        x = self.model(x)\n",
    "        return x\n",
    "\n",
    "model = VisionNet()\n",
    "model.to(device)\n",
    "\n",
    "# 简单验证模型结构\n",
    "if __name__ == '__main__':\n",
    "    input = torch.ones((64, 3, 32, 32))\n",
    "    output = model(input)\n",
    "    print(output.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 网络架构\n",
    "\n",
    "设计一个包含多个卷积层、池化层和全连接层的CNN模型。卷积层用于提取图像特征，池化层用于降低特征图的维度，全连接层用于分类。网络架构的具体细节可能根据实验和调优有所不同，但通常包括以下几个部分：\n",
    "    \n",
    "    卷积层：使用多个卷积核提取图像特征。\n",
    "    \n",
    "    池化层：如最大池化层，用于降低特征图的尺寸。\n",
    "    \n",
    "    全连接层：将特征图映射到类别空间。\n",
    "\n",
    "### 激活函数\n",
    "\n",
    "在每个卷积层和全连接层后，使用ReLU激活函数，以引入非线性特性。ReLU函数的形式为f(x) = max(0, x)，它有助于解决梯度消失问题，并加速模型的收敛。\n",
    "\n",
    "### 损失函数与优化器\n",
    "\n",
    "在训练过程中，选择交叉熵损失函数作为模型的损失函数。交叉熵损失函数常用于多分类问题，它能够衡量模型输出与真实标签之间的差异。SGD优化器以较小的学习率逐步更新模型参数，有助于找到全局最优解。\n",
    "\n",
    "损失函数和优化器的代码如下所示："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch.optim as optim\n",
    "\n",
    "# 损失函数\n",
    "loss_fn = nn.CrossEntropyLoss()\n",
    "loss_fn = loss_fn.to(device)\n",
    "\n",
    "# 优化器\n",
    "learning_rate = 1e-1\n",
    "optimizer = optim.SGD(model.parameters(), lr=learning_rate)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 学习率调度器\n",
    " \n",
    "为了进一步提高模型的性能，采用学习率衰减策略。通过`torch.optim.lr_scheduler.StepLR`，设置每10个epoch学习率减半。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "scheduler = StepLR(optimizer, step_size=10, gamma=0.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 训练与评估循环\n",
    "\n",
    "接下来，设置训练与评估的循环。在每个epoch中，模型首先在训练集上进行训练，然后在测试集上进行评估。\n",
    "\n",
    "### 训练步骤\n",
    "\n",
    "在训练步骤中，模型被设置为训练模式（model.train()），以确保Dropout层和Batch Normalization层按预期工作。然后，使用DataLoader迭代训练数据，每次取出一个batch的数据进行前向传播、计算损失、反向传播和优化。同时，添加tensorboard，以可视化训练过程。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 设置训练网络的一些参数\n",
    "# 记录训练的次数\n",
    "total_train_step = 0\n",
    "# 记录测试的次数\n",
    "total_test_step = 0\n",
    "\n",
    "# 添加tensorboard可视化训练过程\n",
    "from torch.utils.tensorboard import SummaryWriter\n",
    "writer = SummaryWriter(\"./new_logs_train\")\n",
    "cumulative_test_loss = 0\n",
    "cumulative_accuracy = 0\n",
    "num_test_batches = len(test_dataloader)\n",
    "\n",
    "# 监控时间\n",
    "import time\n",
    "start_time = time.time()\n",
    "\n",
    "# 训练的轮数\n",
    "epoch = 50"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 评估步骤\n",
    "\n",
    "在评估步骤中，模型被设置为评估模式（model.eval()），以关闭Dropout层和Batch Normalization层的训练特性。然后，使用DataLoader迭代测试数据，计算模型在测试集上的损失和准确率。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for i in range(epoch):\n",
    "    print(f\"\\n————————————第{i+1}轮训练开始————————————\")\n",
    "\n",
    "    # 训练步骤开始\n",
    "    model.train()\n",
    "    total_train_loss = 0\n",
    "    total_train_correct = 0\n",
    "    for data in train_dataloader:\n",
    "        imgs, targets = data\n",
    "        imgs = imgs.to(device)\n",
    "        targets = targets.to(device)\n",
    "        outputs = model(imgs)\n",
    "        loss = loss_fn(outputs, targets)\n",
    "\n",
    "        # 优化器优化模型\n",
    "        optimizer.zero_grad()\n",
    "        loss.backward()\n",
    "        optimizer.step()\n",
    "\n",
    "        total_train_step += 1\n",
    "        if total_train_step % 100 == 0:\n",
    "            end_time = time.time()\n",
    "            print(end_time - start_time)\n",
    "            print(f\"训练次数：{total_train_step}, Loss:{loss.item()}\")\n",
    "            writer.add_scalar(\"train_loss\", loss.item(), total_train_step)\n",
    "\n",
    "        # 计算正确预测的样本数\n",
    "        correct = (outputs.argmax(1) == targets).sum().item()\n",
    "        total_train_correct += correct\n",
    "\n",
    "        # 累加训练损失\n",
    "        total_train_loss += loss.item()\n",
    "\n",
    "    # 计算训练集上的平均Loss和正确率\n",
    "    avg_train_loss = total_train_loss / len(train_dataloader)\n",
    "    avg_train_accuracy = total_train_correct / len(train_dataloader.dataset)\n",
    "\n",
    "    # 记录训练集上的平均Loss和正确率到TensorBoard\n",
    "    writer.add_scalar(\"train_loss_avg\", avg_train_loss, total_train_step)\n",
    "    writer.add_scalar(\"train_accuracy_avg\", avg_train_accuracy, total_train_step)\n",
    "\n",
    "    # 更新学习率\n",
    "    scheduler.step()\n",
    "\n",
    "    # 测试步骤开始\n",
    "    model.eval()\n",
    "    total_test_loss = 0\n",
    "    total_accuracy = 0\n",
    "    with torch.no_grad():\n",
    "        for data in test_dataloader:\n",
    "            imgs, targets = data\n",
    "            imgs = imgs.to(device)\n",
    "            targets = targets.to(device)\n",
    "            outputs = model(imgs)\n",
    "            loss = loss_fn(outputs, targets)\n",
    "\n",
    "            # 累加每个批次的Loss和正确率\n",
    "            total_test_loss += loss.item()\n",
    "            accuracy = (outputs.argmax(1) == targets).sum()\n",
    "            total_accuracy +=accuracy\n",
    "\n",
    "        # 增加测试步骤计数器\n",
    "        total_test_step += 1\n",
    "        # 计算测试集的平均Loss和正确率\n",
    "        avg_test_loss = total_test_loss / len(test_dataloader)\n",
    "        avg_accuracy = total_accuracy / len(test_dataloader.dataset)\n",
    "        # 记录测试集的平均Loss和正确率到TensorBoard\n",
    "        writer.add_scalar(\"test_loss_avg\", avg_test_loss, total_test_step)\n",
    "        writer.add_scalar(\"test_accuracy_avg\", avg_accuracy, total_test_step)\n",
    "        # 重置累计值，为下一个epoch做准备\n",
    "        cumulative_test_loss = 0\n",
    "        cumulative_accuracy = 0\n",
    "        # 打印整体测试集上的Loss和正确率\n",
    "        print(f\"\\n整体测试集上的平均Loss: {avg_test_loss}\")\n",
    "        print(f\"整体测试集上的平均正确率: {avg_accuracy}\")\n",
    "\n",
    "    # 保存模型\n",
    "    torch.save(model.state_dict(), f\"./new_model_i/model_{i+1}.pth\")\n",
    "    print(\"模型已保存\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 模型保存\n",
    "\n",
    "在每个epoch结束后，将训练好的模型保存到磁盘上，以便将来进行进一步的分析或部署。\n",
    "\n",
    "### TensorBoard记录\n",
    "\n",
    "为了可视化训练过程中的损失和准确率，使用TensorBoard。在训练结束后，关闭TensorBoard的SummaryWriter。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "writer.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在终端输入以下指令打开TensorBoard查看logs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "tensorboard --logdir=new_logs_train"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 模型应用\n",
    "\n",
    "使用训练好的VisionNet模型对图像进行识别分类。\n",
    "\n",
    "首先，设置图像存放的目录和模型文件的路径。\n",
    "接着，定义图像转换流程，包括调整图像大小为32x32像素，并将其转换为张量格式以适应模型输入。\n",
    "然后，加载VisionNet模型，并通过load_state_dict方法导入之前保存的模型参数。将模型设置为评估模式后，定义包含10个类别名称的列表，以便将模型输出的类别索引映射到具体的类别名称。\n",
    "\n",
    "随后，使用glob模块遍历指定目录下的所有图像文件。\n",
    "对于每张图像，先打开并转换为RGB格式，再应用之前定义的转换流程，并添加批次维度以符合模型输入要求。\n",
    "在关闭梯度计算的情况下，将图像输入模型进行前向传播，得到预测输出。\n",
    "最后，通过argmax函数获取预测类别索引，并打印出图像路径和预测的类别名称。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import glob\n",
    "from PIL import Image\n",
    "\n",
    "# 设置图片目录和模型路径\n",
    "image_dir = \"./test_set/\"\n",
    "model_path = \"./new_model_i/model_50.pth\"\n",
    "\n",
    "# 定义图片转换\n",
    "transform = transforms.Compose([\n",
    "    transforms.Resize((32, 32)),\n",
    "    transforms.ToTensor()\n",
    "])\n",
    "\n",
    "# 加载网络模型\n",
    "model = VisionNet()\n",
    "model.load_state_dict(torch.load(model_path))\n",
    "model.eval()\n",
    "\n",
    "# 类别名称列表\n",
    "class_names = [\"airplane\", \"automobile\", \"bird\", \"cat\", \"deer\", \"dog\", \"frog\", \"horse\", \"ship\", \"truck\"]\n",
    "\n",
    "# 图片格式列表，根据需要添加或删除格式\n",
    "image_formats = [\"*.png\", \"*.jpg\", \"*.jpeg\"]\n",
    "\n",
    "# 获取所有匹配的图片文件路径\n",
    "image_files = []\n",
    "for fmt in image_formats:\n",
    "    image_files.extend(glob.glob(os.path.join(image_dir, fmt)))\n",
    "\n",
    "# 遍历每张图片\n",
    "for image_path in image_files:\n",
    "    image = Image.open(image_path)\n",
    "    image = image.convert('RGB')\n",
    "    \n",
    "    # 应用转换\n",
    "    image = transform(image)\n",
    "    image = image.unsqueeze(0)  # 添加批次维度\n",
    "    \n",
    "    with torch.no_grad():\n",
    "        output = model(image)\n",
    "    \n",
    "    # 打印预测的类别\n",
    "    predicted_class = output.argmax(1).item()\n",
    "    predicted_class_name = class_names[predicted_class]\n",
    "    print(f\"Image: {image_path}, Predicted class: {predicted_class_name}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 总结\n",
    "\n",
    "通过本项目，深入了解PyTorch深度学习框架的使用，掌握CIFAR-10数据集的加载和预处理方法。学会如何设计和优化CNN模型，包括数据增强、激活函数的选择、损失函数与优化器的使用、学习率调整以及Dropout等优化技巧。此外，学会如何使用TensorBoard进行训练过程的可视化。最终，构建一个能够相对准确地对上述的10个类别的图像进行识别分类的模型。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "pytorch",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
